We introduce an ambidextrous view of stochastic dynamical systems, comparing their forward-time and reverse-time representations and then integrating them into a single time-symmetric representation. The perspective is useful theoretically, computationally, and conceptually. Mathematically, we prove that the excess entropy, a familiar measure of organization in complex systems, is the mutual information not only between the past and future, but also between the predictive and retrodictive causal states. Practically, we exploit the connection between prediction and retrodiction to directly calculate the excess entropy. Conceptually, these lead one to discover new system invariants for stochastic dynamical systems: crypticity (information accessibility) and causal irreversibility. Ultimately, we introduce a time-symmetric representation that unifies all these quantities, compressing the two directional representations into one. The resulting compression offers a new conception of the amount of information stored in the present.
INTRODUCTION

"Predicting time series" encapsulates two notions of 
directionality. Prediction — making a claim about the 
future based on the past — is directional. Time evokes 
images of rivers, clocks, and actions in progress. Curi- 
ously, though, when one writes a time series as a lattice 
of random variables, any necessary dependence on time's 
inherent direction is removed; at best it becomes conven- 
tion. When we analyze a stochastic process to determine 
its correlation function, block entropy, entropy rate, and 
the like, we already have shed our commitment to the 
idea of forward by virtue of the fact that these quantities 
are defined independently of any perceived direction of 
the process. 

Here we explore this ambivalence. In making it explicit, we consider not only predictive models, but also retrodictive models. We then demonstrate that it is possible to unify these two viewpoints and, in doing so, we discover several new properties of stationary stochastic dynamical systems. Along the way, we also rediscover, and recast, old ones.

We first review minimal causal representations of stochastic processes, as developed by computational mechanics [?]. We extend its (implied) forward-time representation to reverse-time. Then, we prove that the mutual information between a process's past and future (the excess entropy) is the mutual information between its forward- and reverse-time representations.

Excess entropy, and related mutual information quantities, are widely used diagnostics for complex systems. They have been applied to detect the presence of organization in dynamical systems [?], in spin systems [7, 8, 9], in neurobiological systems [?], and even in language, to mention only a few applications. For example, in natural language the excess entropy (E) diverges with the number of characters L as $\mathbf{E} \propto L^{1/2}$. The claim is that this reflects the long-range and strongly nonergodic organization necessary for human communication [?].

The net result is a unified view of information processing in stochastic processes. For the first time, we give an explicit relationship between the internal (causal) state information (the statistical complexity [?]) and the observed information (the excess entropy). Another consequence is that the forward and reverse representations are two projections of a unified time-symmetric representation. From the latter it becomes clear there are important system invariants that control how accessible internal state information is and how irreversible a process is. Moreover, the methods are sufficiently constructive that one can calculate the excess entropy in closed form for finite-memory processes.



Before embarking, we refer the reader to Ref. [?] for complementary results, not covered here, on the measure-theoretic relationships between the above information quantities. The announcement of those results and those in the present work appeared in Ref. [?]. Here we lay out the theory in detail, giving step-by-step proofs of the main results and the calculational methods.



OPTIMAL CAUSAL MODELS 

Our approach starts with a simple analogy. Any process $\Pr(\overleftarrow{X}, \overrightarrow{X})$ is also a communication channel with a specified input distribution $\Pr(\overleftarrow{X})$ [?]: It transmits information from the past $\overleftarrow{X} = \ldots X_{-3} X_{-2} X_{-1}$ to the future $\overrightarrow{X} = X_0 X_1 X_2 \ldots$ by storing it in the present. $X_t$ is the random variable for the measurement outcome at time t. Our goal is also simply stated: We wish to predict the future using information from the past. At root, a prediction is probabilistic, specified by a distribution of possible futures $\overrightarrow{X}$ given a particular past $\overleftarrow{x}$: $\Pr(\overrightarrow{X} | \overleftarrow{x})$. At a minimum, a good predictor needs to capture all of the information I shared between past and future: $\mathbf{E} = I[\overleftarrow{X}; \overrightarrow{X}]$, the process's excess entropy [?, and references therein].

Consider now the goal of modeling: building a representation that allows not only good prediction but also expresses the mechanisms producing a system's behavior. To build a model of a structured process (a memoryful channel), computational mechanics [?] introduced an equivalence relation $\overleftarrow{x} \sim \overleftarrow{x}'$ that groups all histories which give rise to the same prediction:

  $\epsilon(\overleftarrow{x}) = \{ \overleftarrow{x}' : \Pr(\overrightarrow{X} | \overleftarrow{x}) = \Pr(\overrightarrow{X} | \overleftarrow{x}') \}$ .   (1)

In other words, for the purpose of forecasting the future, two different pasts are equivalent if they result in the same prediction. The result of applying this equivalence gives the process's causal states $\mathcal{S} = \Pr(\overleftarrow{X}, \overrightarrow{X}) / \sim$, which partition the space $\overleftarrow{\mathbf{X}}$ of pasts into sets that are predictively equivalent. The set of causal states [?] can be discrete, fractal, or continuous; see, e.g., Figs. 7, 8, 10, and 17 in Ref. [?].

State-to-state transitions are denoted by matrices $T^{(x)}_{\mathcal{S}\mathcal{S}'}$, whose elements give the probability $\Pr(X = x, \mathcal{S}' | \mathcal{S})$ of transitioning from one state $\mathcal{S}$ to the next $\mathcal{S}'$ on seeing measurement x. The resulting model, consisting of the causal states and transitions, is called the process's ε-machine. Given a process $\mathcal{P}$, we denote its ε-machine by $M(\mathcal{P})$.

Causal states have a Markovian property: they render the past and future statistically independent; they shield the future from the past [3]:

  $\Pr(\overleftarrow{X}, \overrightarrow{X} | \mathcal{S}) = \Pr(\overleftarrow{X} | \mathcal{S}) \Pr(\overrightarrow{X} | \mathcal{S})$ .   (2)

Moreover, they are optimally predictive [?] in the sense that knowing which causal state a process is in is just as good as having the entire past: $\Pr(\overrightarrow{X} | \mathcal{S}) = \Pr(\overrightarrow{X} | \overleftarrow{X})$. In other words, causal shielding is equivalent to the fact that the causal states capture all of the information shared between past and future: $I[\mathcal{S}; \overrightarrow{X}] = \mathbf{E}$.

ε-Machines have an important, if subtle, structural property called unifilarity [?]: From the start state, each observed sequence $\ldots x_{-3} x_{-2} x_{-1} \ldots$ corresponds to one and only one sequence of causal states [?]. ε-Machine unifilarity underlies many of the results here. Its importance is reflected in the fact that representations without unifilarity, such as general hidden Markov models, cannot be used to directly calculate important system properties, including the most basic, such as how random a process is. Nonetheless, unifilarity is easy to verify: For each state, each measurement symbol appears on at most one outgoing transition [?]. The signature of unifilarity is that on knowing the current state and measurement, the uncertainty in the next state vanishes: $H[\mathcal{S}_{t+1} | \mathcal{S}_t, X_t] = 0$. In summary, a process's ε-machine is its unique minimal unifilar model.
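
As an illustration, the verification rule just stated (each symbol on at most one outgoing transition per state) can be sketched in Python. This is a minimal sketch under stated assumptions: the function name `is_unifilar`, the dictionary-of-matrices layout, and both two-state examples are hypothetical and are not taken from the text.

```python
import numpy as np

def is_unifilar(T):
    """Return True if every (state, symbol) pair has at most one successor.

    T maps each symbol x to a square array with
    T[x][i, j] = Pr(X = x, S' = j | S = i).
    """
    for Tx in T.values():
        # For each current state (row), count next states reachable on x.
        successors_per_state = (np.asarray(Tx) > 0).sum(axis=1)
        if np.any(successors_per_state > 1):
            return False
    return True

# Hypothetical two-state example: binary sequences with no consecutive 0s.
unifilar_example = {
    0: np.array([[0.0, 0.5], [0.0, 0.0]]),
    1: np.array([[0.5, 0.0], [1.0, 0.0]]),
}

# Hypothetical counterexample: from state 0, symbol 1 leads to state 0 or 1.
nonunifilar_example = {
    0: np.array([[0.5, 0.0], [0.5, 0.0]]),
    1: np.array([[0.25, 0.25], [0.0, 0.5]]),
}

print(is_unifilar(unifilar_example))
print(is_unifilar(nonunifilar_example))
```

The check only inspects which entries are nonzero, so it tests the structural condition, not minimality.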



INFORMATION PROCESSING INVARIANTS 

Out of all optimally predictive models $\mathcal{R}$, for which $I[\mathcal{R}; \overrightarrow{X}] = \mathbf{E}$, the ε-machine captures the minimal amount of information that a process must store in order to communicate all of the excess entropy from the past to the future. This is the Shannon information contained in the causal states, the statistical complexity [?]: $C_\mu \equiv H[\mathcal{S}] \le H[\mathcal{R}]$. In short, E is the effective information transmission rate of the process, viewed as a channel, and $C_\mu$ is the sophistication of that channel.

Combined, these properties mean that the ε-machine is the basis against which modeling should be compared, since it captures all of a process's information at maximum representational efficiency.

In addition to E and $C_\mu$, another key (and historically prior) invariant for dynamical systems and stochastic processes is the entropy rate:

  $h_\mu = \lim_{L \to \infty} \frac{H(L)}{L}$ ,   (3)

where H(L) is the Shannon entropy of length-L sequences $X^L$. This is the per-measurement rate at which the process generates information, its degree of intrinsic randomness.

Importantly, due to unifilarity one can calculate the entropy rate directly from a process's ε-machine:

  $h_\mu = H[X | \mathcal{S}] = - \sum_{\{\mathcal{S}\}} \Pr(\mathcal{S}) \sum_{\{x\}} T^{(x)}_{\mathcal{S}\mathcal{S}'} \log_2 T^{(x)}_{\mathcal{S}\mathcal{S}'}$ .   (4)

Here $\mathcal{S}'$ is the unique successor of $\mathcal{S}$ on symbol x. $\Pr(\mathcal{S})$ is the asymptotic probability of the causal states, which is obtained as the normalized principal (left) eigenvector of the transition matrix $T = \sum_{\{x\}} T^{(x)}$. We use $\pi$ to denote the distribution over the causal states as a row vector. Note that a process's statistical complexity can also be directly calculated from its ε-machine:

  $C_\mu = H[\mathcal{S}] = - \sum_{\{\mathcal{S}\}} \Pr(\mathcal{S}) \log_2 \Pr(\mathcal{S})$ .   (5)

Thus, the ε-machine directly gives two important invariants: a process's rate ($h_\mu$) of producing information and the amount ($C_\mu$) of historical information it stores in doing so.
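
Eqs. (4) and (5) translate almost directly into code. The following Python sketch assumes a unifilar presentation given as a dictionary of symbol-labeled matrices; the helper names and the two-state example (binary sequences with no consecutive 0s) are hypothetical and serve only as an illustration.

```python
import numpy as np

def stationary_distribution(T):
    """Normalized left eigenvector (eigenvalue 1) of T = sum_x T[x]."""
    T_total = sum(T.values())
    evals, evecs = np.linalg.eig(T_total.T)
    pi = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
    return pi / pi.sum()

def plogp(p):
    """Elementwise p * log2(p), using the convention 0 * log2(0) = 0."""
    p = np.asarray(p, dtype=float)
    out = np.zeros_like(p)
    mask = p > 0
    out[mask] = p[mask] * np.log2(p[mask])
    return out

def entropy_rate(T):
    """Eq. (4): h_mu = -sum_S Pr(S) sum_x T^(x)_{SS'} log2 T^(x)_{SS'}."""
    pi = stationary_distribution(T)
    return float(-sum(pi @ plogp(Tx).sum(axis=1) for Tx in T.values()))

def statistical_complexity(T):
    """Eq. (5): C_mu = -sum_S Pr(S) log2 Pr(S)."""
    return float(-plogp(stationary_distribution(T)).sum())

# Hypothetical unifilar example with stationary distribution (2/3, 1/3).
example = {
    0: np.array([[0.0, 0.5], [0.0, 0.0]]),
    1: np.array([[0.5, 0.0], [1.0, 0.0]]),
}
h_mu = entropy_rate(example)
C_mu = statistical_complexity(example)
print(h_mu, C_mu)
```

For this example the first state is visited two thirds of the time and emits a fair coin flip, so $h_\mu = 2/3$ bit per symbol. Applying Eq. (4) to a nonunifilar presentation would not, in general, give the process's entropy rate.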
EXCESS ENTROPY

Until recently, E could not be as directly calculated as the entropy rate and the statistical complexity. This state of affairs was a major roadblock to analyzing the relationships between modeling and predicting and, more concretely, the relationships between (and even the interpretation of) a process's basic invariants: $h_\mu$, $C_\mu$, and E. Ref. [3] announced the solution to this longstanding problem by deriving explicit expressions for E in terms of the ε-machine, providing a unified information-theoretic analysis of general processes. Here we provide a detailed account of the underlying methods and results.

To get started, we should recall what is already known about the relationships between these various quantities. First, some time ago, an explicit expression was developed from the Hamiltonian for one-dimensional spin chains with range-R interactions [8]:

  $\mathbf{E} = C_\mu - R \, h_\mu$ .   (6)

It was demonstrated that E is a generalized order parameter: Compared to structure factors, E is an assumption-free way to find structure and correlation in spin systems that does not require tuning.

Second, it has also been known for some time that the statistical complexity is an upper bound on the excess entropy [3]:

  $\mathbf{E} \le C_\mu$ .   (7)

Nonetheless, other than the special, if useful, case of spin systems, until Ref. [3] there had been no direct way to calculate E. Remedying this limitation required broadening the notion of what a process is.



RETRODICTION 

The original results of computational mechanics concern using the past to predict the future. But we can also retrodict: use the future to predict the past. That is, we scan the measurement variables not in the forward time direction, but in the reverse. The computational mechanics formalism is essentially unchanged, though its meaning and notation need to be augmented [?].

With this in mind, the previous mapping from pasts to causal states is now denoted $\epsilon^+$ and it gave what we will call the predictive causal states $\mathcal{S}^+$. When scanning in the reverse direction, we have a new relation, $\overrightarrow{x} \sim^- \overrightarrow{x}'$, which groups futures that are equivalent for the purpose of retrodicting the past: $\epsilon^-(\overrightarrow{x}) = \{ \overrightarrow{x}' : \Pr(\overleftarrow{X} | \overrightarrow{x}) = \Pr(\overleftarrow{X} | \overrightarrow{x}') \}$. It gives the retrodictive causal states $\mathcal{S}^- = \Pr(\overleftarrow{X}, \overrightarrow{X}) / \sim^-$. And, not surprisingly, we must also distinguish the forward-scan ε-machine $M^+$ from the reverse-scan ε-machine $M^-$. They assign corresponding entropy rates, $h_\mu^+$ and $h_\mu^-$, and statistical complexities, $C_\mu^+ = H[\mathcal{S}^+]$ and $C_\mu^- = H[\mathcal{S}^-]$, respectively, to the process.

To orient ourselves, a graphical aid, the hidden process lattice, is helpful at this point; see Table I.

                 Past                    |  Present  |            Future
         $\overleftarrow{X}$             |           |     $\overrightarrow{X}$
  ...  X_{-3}     X_{-2}     X_{-1}      |           |  X_0      X_1      X_2    ...
  ...  S^+_{-3}   S^+_{-2}   S^+_{-1}    |  S^+_0    |  S^+_1    S^+_2    S^+_3  ...
  ...  S^-_{-3}   S^-_{-2}   S^-_{-1}    |  S^-_0    |  S^-_1    S^-_2    S^-_3  ...

TABLE I: Hidden Process Lattice: The X variables denote the observed process; the S variables, the hidden states. If one scans the observed variables in the positive direction, seeing $X_{-3}$, $X_{-2}$, and $X_{-1}$, then that history takes one to causal state $\mathcal{S}^+_0$. Analogously, if one scans in the reverse direction, then the succession of variables $X_2$, $X_1$, and $X_0$ leads to $\mathcal{S}^-_0$.

Now we are in a position to ask some questions. Perhaps the most obvious is: In which time direction is a process most predictable? The answer is that a process is equally predictable in either:

Proposition 1. For a stationary process, optimally predicting the future and optimally retrodicting the past are equally effective: $h_\mu^- = h_\mu^+$.

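Proof. A stationary stochastic process satisfies:

  $H[X_{-L+2}, \ldots, X_0] = H[X_{-L+1}, \ldots, X_{-1}]$ .   (8)

Keeping this in mind, we directly calculate:

  $h_\mu^+ = H[X_0 | \overleftarrow{X}]$
  $= \lim_{L \to \infty} H[X_0 | X_{-L+1}, \ldots, X_{-1}]$
  $= \lim_{L \to \infty} ( H[X_{-L+1}, \ldots, X_0] - H[X_{-L+1}, \ldots, X_{-1}] )$
  $= \lim_{L \to \infty} ( H[X_{-L+1}, \ldots, X_0] - H[X_{-L+2}, \ldots, X_0] )$
  $= \lim_{L \to \infty} ( H[X_{-1}, \ldots, X_{L-2}] - H[X_0, \ldots, X_{L-2}] )$
  $= \lim_{L \to \infty} H[X_{-1} | X_0, \ldots, X_{L-2}]$
  $= H[X_{-1} | \overrightarrow{X}]$
  $= h_\mu^-$ . □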









Somewhat surprisingly, the effort involved in optimally predicting and retrodicting is not necessarily the same:

Proposition 2 [?]. There exist stationary processes for which $C_\mu^- \neq C_\mu^+$.

Proof. The random-insertion process, analyzed in a later section, establishes this by example. □

Note that E is mute on this score. Since the mutual information I is symmetric in its variables [?], E is time symmetric. Proposition 2 puts us on notice that E necessarily misses many of a process's structural properties.
EXCESS ENTROPY FROM CAUSAL STATES

The relationship between predicting and retrodicting a process, and ultimately E's role, requires teasing out how the states of the forward and reverse ε-machines capture information from the past and the future. To do this we analyzed a four-variable mutual information: $I[\overleftarrow{X}; \overrightarrow{X}; \mathcal{S}^+; \mathcal{S}^-]$. A large number of expansions of this quantity are possible. A systematic development follows from Ref. [23], which showed that Shannon entropy $H[\cdot]$ and mutual information $I[\cdot\,; \cdot]$ form a signed measure over the space of events. Practically, there is a direct correspondence between set theory and these information measures. Using this, Ref. [14] developed an ε-machine information diagram over four variables, which gives a minimal set of entropies, conditional entropies, mutual informations, and conditional mutual informations necessary to analyze the relationships among $h_\mu$, $C_\mu$, and E for general stochastic processes.

In a generic four-variable information diagram, there are 15 independent variables. Fortunately, this greatly simplifies in the case of using an ε-machine to represent a process; there are only 5 independent variables in the ε-machine information diagram [14]. (These results are announced in [3]; see Fig. 1 there.)

Simplified in this way, we are left with our main results which, due to the preceding effort, are particularly transparent.

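Theorem 1. Excess entropy is the mutual information between the predictive and retrodictive causal states:

  $\mathbf{E} = I[\mathcal{S}^+; \mathcal{S}^-]$ .   (9)

Proof. This follows due to the redundancy of pasts and predictive causal states, on the one hand, and of futures and retrodictive causal states, on the other. These redundancies, in turn, are expressed via $\mathcal{S}^+ = \epsilon^+(\overleftarrow{X})$ and $\mathcal{S}^- = \epsilon^-(\overrightarrow{X})$, respectively. That is, we have

  $I[\overleftarrow{X}; \overrightarrow{X}; \mathcal{S}^+; \mathcal{S}^-] = I[\overleftarrow{X}; \overrightarrow{X}] = \mathbf{E}$ ,   (10)

on the one hand, and

  $I[\overleftarrow{X}; \overrightarrow{X}; \mathcal{S}^+; \mathcal{S}^-] = I[\mathcal{S}^+; \mathcal{S}^-]$ ,   (11)

on the other. □

That is, the process's effective channel capacity $\mathbf{E} = I[\overleftarrow{X}; \overrightarrow{X}]$ is the same as that of a "channel" between the forward and reverse ε-machine states.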

Proposition 3. The predictive and retrodictive statistical complexities are:

  $C_\mu^+ = \mathbf{E} + H[\mathcal{S}^+ | \mathcal{S}^-]$ and   (12)
  $C_\mu^- = \mathbf{E} + H[\mathcal{S}^- | \mathcal{S}^+]$ .   (13)

Proof. $\mathbf{E} = I[\mathcal{S}^+; \mathcal{S}^-] = H[\mathcal{S}^+] - H[\mathcal{S}^+ | \mathcal{S}^-]$. Since the first term is $C_\mu^+$, we have the predictive statistical complexity. Similarly for the retrodictive complexity. □

Corollary 1. $C_\mu^+ \ge H[\mathcal{S}^+ | \mathcal{S}^-]$ and $C_\mu^- \ge H[\mathcal{S}^- | \mathcal{S}^+]$.

Proof. $\mathbf{E} \ge 0$. □

The Theorem and its companion Proposition give an explicit connection between a process's excess entropy and its causal structure, its ε-machines. More generally, the relationships directly tie mutual information measures of observed sequences to a process's internal structure. This is our main result. It allows us to probe the properties that control how closely observed statistics reflect a process's hidden organization. However, this requires that we understand how $M^+$ and $M^-$ are related. We express this relationship with a unifying model: the bidirectional machine.
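
To make Theorem 1 and Proposition 3 concrete, the following Python sketch computes E, $C_\mu^+$, and $C_\mu^-$ from a joint distribution over forward and reverse causal states. The joint distribution here is a hypothetical 2-by-2 table chosen only for illustration; it is not derived from any process discussed in the text, and the function names are likewise assumptions.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a (possibly multi-dimensional) distribution."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def mutual_information(joint):
    """I[S+; S-] computed directly as a relative entropy."""
    joint = np.asarray(joint, dtype=float)
    product = np.outer(joint.sum(axis=1), joint.sum(axis=0))
    mask = joint > 0
    return float((joint[mask] * np.log2(joint[mask] / product[mask])).sum())

# Hypothetical joint distribution: rows index S+, columns index S-.
joint = np.array([[0.4, 0.1],
                  [0.1, 0.4]])

E = mutual_information(joint)                 # Theorem 1: E = I[S+; S-]
C_plus = entropy(joint.sum(axis=1))           # C_mu^+ = H[S+]
C_minus = entropy(joint.sum(axis=0))          # C_mu^- = H[S-]
H_plus_given_minus = entropy(joint) - C_minus # H[S+ | S-]
H_minus_given_plus = entropy(joint) - C_plus  # H[S- | S+]

print(E, C_plus, C_minus)
```

With these quantities in hand, Eqs. (12) and (13) can be checked numerically, since E is computed independently of the conditional entropies.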



THE BIDIRECTIONAL MACHINE 

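At this point, we have two separate ε-machines: one for predicting ($M^+$) and one for retrodicting ($M^-$). We will now show that one can do better, by simultaneously utilizing causal information from the past and future.

Definition. Let $M^\pm$ denote the bidirectional machine given by the equivalence relation $\sim^\pm$ [?]:

  $\epsilon^\pm(\overleftrightarrow{x}) = \epsilon^\pm(\overleftarrow{x}, \overrightarrow{x})$
  $= \{ (\overleftarrow{x}', \overrightarrow{x}') : \overleftarrow{x}' \in \epsilon^+(\overleftarrow{x}) \text{ and } \overrightarrow{x}' \in \epsilon^-(\overrightarrow{x}) \}$

with causal states $\mathcal{S}^\pm = \Pr(\overleftrightarrow{X}) / \sim^\pm$.

That is, the bidirectional causal states are a partition of $\overleftrightarrow{\mathbf{X}}$: $\mathcal{S}^\pm \subseteq \mathcal{S}^+ \times \mathcal{S}^-$. This follows from a straightforward adaptation of the analogous result for forward ε-machines [2].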

To illustrate, imagine being given a particular realization $\overleftrightarrow{x}$. In effect, the bidirectional machine describes how one can move around on the hidden process lattice of Table I:

1. When scanning in the forward direction, states and transitions associated with $M^+$ are followed.

2. When scanning in the reverse direction, states and transitions associated with $M^-$ are followed.

3. At any time, one can change to the opposite scan direction, moving to the state of the opposite scan's ε-machine. For example, if one moves forward following $M^+$ and ends in state $\mathcal{S}^+$, having seen $\overleftarrow{x}$ and about to see $\overrightarrow{x}$, then one moves to $\mathcal{S}^- = \epsilon^-(\overrightarrow{x})$.

At time t, the bidirectional causal state is $\mathcal{S}^\pm_t = (\epsilon^+(\overleftarrow{x}_t), \epsilon^-(\overrightarrow{x}_t))$. When scanning in the forward direction, the first symbol of $\overrightarrow{x}_t$ is removed and appended to $\overleftarrow{x}_t$. When scanning in the reverse direction, the last symbol in $\overleftarrow{x}_t$ is removed and prefixed to $\overrightarrow{x}_t$. In either situation, the new bidirectional causal state is determined by $\epsilon^\pm$ and the updated past and future.

This illustrates the relationship between $\mathcal{S}^+$ and $\mathcal{S}^-$, as specified by $M^\pm$, when given a particular realization. Generally, though, one considers an ensemble $\overleftrightarrow{X}$ of realizations. In this case, the bidirectional state transitions are probabilistic and possibly nonunifilar. This relationship can be made more explicit through the use of maps between the forward and reverse causal states. These are the switching maps.

The forward map is a linear function from the simplex over $\mathcal{S}^-$ to the simplex over $\mathcal{S}^+$, and analogously for the reverse map. The maps are defined in terms of conditional probability distributions:

1. The forward map $f : \Delta^n \to \Delta^m$, where $f(\sigma^-) = \Pr(\mathcal{S}^+ | \sigma^-)$; and

2. The reverse map $r : \Delta^m \to \Delta^n$, where $r(\sigma^+) = \Pr(\mathcal{S}^- | \sigma^+)$,

where $n = |\mathcal{S}^-|$ and $m = |\mathcal{S}^+|$.

We will sometimes refer to these maps in the Boolean rather than probabilistic sense. The case will be clear from context.

Proposition 4. r and f are onto.

Proof. Consider the reverse map r that takes one from a forward causal state to a reverse causal state. Assume r is not onto. Then there must be a reverse state $\sigma^-$ that is not in the range of $r(\mathcal{S}^+)$. This means that no forward causal state is paired with $\sigma^-$ and so there is no past $\overleftarrow{x}$ with a possible future $\overrightarrow{x} \in \sigma^-$. That is, $\epsilon^\pm(\overleftarrow{x}, \overrightarrow{x}) = \emptyset$ and, specifically, $\epsilon^-(\overrightarrow{x}) = \emptyset$. Thus, $\sigma^-$ does not exist. A similar argument shows that f is onto. □

Definition. The amount of stored information needed to optimally predict and retrodict a process is $M^\pm$'s statistical complexity:

  $C_\mu^\pm \equiv H[\mathcal{S}^\pm] = H[\mathcal{S}^+, \mathcal{S}^-]$ .   (14)

From the immediately preceding results we obtain the following simple, explicit, and useful relationship:

Corollary 2. $\mathbf{E} = C_\mu^+ + C_\mu^- - C_\mu^\pm$.

Thus, we are led to a wholly new interpretation of the excess entropy, in addition to the original three discussed in Ref. [13]: E is exactly the difference between these structural complexities. Moreover, only when $\mathbf{E} = 0$ does $C_\mu^\pm = C_\mu^+ + C_\mu^-$.

More to the point, thinking of the $C_\mu$s as proportional to the size of the corresponding machine, we establish the representational efficiency of the bidirectional machine:

Proposition 5. $C_\mu^\pm \le C_\mu^+ + C_\mu^-$.

Proof. This follows directly from the preceding corollary and the nonnegativity of mutual information. □

We can say a bit more, with the following bounds.

Corollary 3. $C_\mu^+ \le C_\mu^\pm$ and $C_\mu^- \le C_\mu^\pm$.

These results say that taking into account causal information from the past and the future is more efficient (i) than ignoring one or the other and (ii) than ignoring their relationship.

Upper Bounds 

Here we give new, tighter bounds for E than Eq. (7) and proofs that are greatly simplified compared to those provided in Refs. [?] and [13].

Proposition 6. For a stationary process, $\mathbf{E} \le C_\mu^+$ and $\mathbf{E} \le C_\mu^-$.

Proof. These bounds follow directly from applying basic information inequalities: $I[X; Y] \le H[X]$ and $I[X; Y] \le H[Y]$. Thus, $\mathbf{E} = I[\mathcal{S}^-; \mathcal{S}^+] \le H[\mathcal{S}^-]$, which is $C_\mu^-$. Similarly, since $I[\mathcal{S}^-; \mathcal{S}^+] \le H[\mathcal{S}^+]$, we have $\mathbf{E} \le C_\mu^+$. □

Causal Irreversibility 

We have shown that predicting and retrodicting may require different amounts of information storage ($C_\mu^+ \neq C_\mu^-$). We now examine this asymmetry.

Given a word $w = x_0 x_1 \ldots x_{L-1}$, the word we see when scanning in the reverse direction is $\widetilde{w} = x_{L-1} \ldots x_1 x_0$, where $x_{L-1}$ is encountered first and $x_0$ is encountered last.

Definition. A microscopically reversible process is one for which $\Pr(w) = \Pr(\widetilde{w})$, for all words $w = x^L$ and all L.

Microscopic reversibility simply means that flipping $t \to -t$ leads to the same process $\Pr(\overleftarrow{X}, \overrightarrow{X})$. A microscopically reversible process scanned in both directions yields the same word distribution; we will denote this $\mathcal{P}^+ = \mathcal{P}^-$.

Proposition 7. A microscopically reversible process has $M^- = M^+$.

Proof. If $\mathcal{P}^+ = \mathcal{P}^-$, then $M(\mathcal{P}^+) = M(\mathcal{P}^-)$ since M is a function. And these are $M^+$ and $M^-$, respectively. □

Corollary 4. For a microscopically reversible process, $C_\mu^- = C_\mu^+$.

Proof. For a microscopically reversible process $M^- = M^+$. And so, in particular, $\mathcal{S}^- = \mathcal{S}^+$, their transition matrices are the same, and so $\Pr(\mathcal{S}^-) = \Pr(\mathcal{S}^+)$. Thus, $C_\mu^- = C_\mu^+$. □

Now consider a slightly looser, and more helpful, notion of reversibility, expressed quantitatively as a measure of irreversibility.

Definition. A process's causal irreversibility [?] is:

  $\Xi(\mathcal{P}) = C_\mu^+ - C_\mu^-$ .   (15)

Corollary 5. $\Xi(\mathcal{P}) = H[\mathcal{S}^+ | \mathcal{S}^-] - H[\mathcal{S}^- | \mathcal{S}^+]$.

Note that $\Xi = 0$ does not imply that $M^+ = M^-$. For example, the periodic process ...123123123... is not microscopically reversible, since $\Pr(123) \neq \Pr(321)$. However, $\Xi = 0$, as $C_\mu^- = C_\mu^+ = \log_2 3$.

It turns out, though, that we are more interested in the following situation.

Proposition 8. If $\Xi(\mathcal{P}) \neq 0$, then the process is not microscopically reversible.

Proof. $C_\mu^+ \neq C_\mu^-$ implies that $M^+ \neq M^-$. And so, $\mathcal{P}^+ \neq \mathcal{P}^-$. □

So, a vanishing $\Xi$ will indicate "reversibility" for some classes of processes that are not microscopically reversible. The periodic process just described is one such example. In fact, this includes any process whose left- and right-scan processes are isomorphic under a simultaneous measurement-alphabet and causal-state isomorphism. Given that the spirit of symbolic dynamics is to consider processes only up to isomorphism, this measure seems to capture a very natural notion of irreversibility. Interestingly, it appears, based on several case studies, that causal reversibility captures exactly that notion. That is, it would seem there are no processes for which $\Xi = 0$, yet $\mathcal{P}^+ \not\cong \mathcal{P}^-$ (the two scan directions are not isomorphic). We leave this as a conjecture.

Finally, note that causal irreversibility is not controlled by E, since, as noted above, the latter is scan-symmetric.


Process Crypticity 

Lurking in the preceding development and results is an alternative view of how forecasting and model building are related.

We can extend our use of Shannon's communication theory (processes are memoryful channels) to view the activity of an observer building a model of a process as the attempt to decrypt from a measurement sequence the hidden state information [?]. The parallel we draw is that the design goal of cryptography is to not reveal internal correlations and structure within an encrypted data stream, even though in fact there is a message (hidden organization and structure) that will be revealed to a recipient with the correct codebook. This is essentially the circumstance a scientist faces when building a model, for the first time, from measurements: What are the states and dynamic (hidden message) in the observed data?

Here, we address only the case of self-decoding in which the information used to build a model is only that available in the observed process $\Pr(\overleftrightarrow{X})$. That is, no "side-band" communication, prior knowledge, or disciplinary assumptions are allowed. Note, though, that modeling with such additional knowledge requires solving the self-decoding case, addressed here, first. The self-decoding approach to building nonlinear models from time series was introduced in Ref. [?].

The relationship between excess entropy and statistical complexity established by Thm. 1 indicates that there are fundamental limitations on the amount of a process's stored information directly present in observations, as reflected in the mutual information measure E. We now introduce a measure of this accessibility.



Definition. A process's crypticity is:

  $\chi(M^+, M^-) = H[\mathcal{S}^+ | \mathcal{S}^-] + H[\mathcal{S}^- | \mathcal{S}^+]$ .   (16)

Proposition 9. $\chi(M^+, M^-)$ is the distance between a process's forward and reverse ε-machines.

Proof. $\chi(M^+, M^-)$ is nonnegative, symmetric, and satisfies a triangle inequality. These follow from the solution of exercise 2.9 of Ref. [?]. See also Ref. [?]. □

Theorem 2. $M^\pm$'s statistical complexity is:

  $C_\mu^\pm = \mathbf{E} + \chi$ .   (17)

Proof. This follows directly from Corollary 2 and the predictive and retrodictive statistical complexity relations of Prop. 3: $C_\mu^\pm = C_\mu^+ + C_\mu^- - \mathbf{E} = \mathbf{E} + H[\mathcal{S}^+ | \mathcal{S}^-] + H[\mathcal{S}^- | \mathcal{S}^+]$. □

Referring to $\chi$ as crypticity comes directly from this result: It is the amount of internal state information ($C_\mu^\pm$) not locally present in the observed sequence (E). That is, a process hides $\chi$ bits of information.

Note that if crypticity is low, $\chi \approx 0$, then much of the stored information is present in observed behavior: $\mathbf{E} \approx C_\mu^\pm$. However, when a process's crypticity is high, $\chi \approx C_\mu^\pm$, then little of its structural information is directly present in observations. The measurements appear very close to being independent, identically distributed ($\mathbf{E} \approx 0$) despite the fact that the process can be highly structured ($C_\mu^\pm \gg 0$).

Corollary 6. $M^\pm$'s statistical complexity bounds the process's crypticity:

  $C_\mu^\pm \ge \chi$ .   (18)

Proof. $\mathbf{E} \ge 0$. □
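
The bidirectional quantities can be tabulated together from a joint distribution over $(\mathcal{S}^+, \mathcal{S}^-)$. The Python sketch below does this for a hypothetical joint distribution with three forward and two reverse states; the table and the helper names are assumptions made for illustration and do not come from a process analyzed in the text.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def bidirectional_summary(joint):
    """Return E, chi, Xi, and the three statistical complexities.

    joint[i, j] = Pr(S+ = i, S- = j).
    """
    joint = np.asarray(joint, dtype=float)
    C_plus = entropy(joint.sum(axis=1))    # H[S+]
    C_minus = entropy(joint.sum(axis=0))   # H[S-]
    C_pm = entropy(joint)                  # Eq. (14): H[S+, S-]
    chi_plus = C_pm - C_minus              # H[S+ | S-]
    chi_minus = C_pm - C_plus              # H[S- | S+]
    return {
        "E": C_plus + C_minus - C_pm,      # Corollary 2
        "chi": chi_plus + chi_minus,       # Eq. (16)
        "Xi": C_plus - C_minus,            # Eq. (15)
        "C_plus": C_plus,
        "C_minus": C_minus,
        "C_pm": C_pm,
    }

# Hypothetical joint distribution: two forward states share one reverse state.
joint = np.array([[0.25, 0.0],
                  [0.25, 0.0],
                  [0.0,  0.5]])
summary = bidirectional_summary(joint)
print(summary)
```

In this example all of the crypticity comes from $H[\mathcal{S}^+ | \mathcal{S}^-]$, so $\chi$ and $\Xi$ coincide, and $C_\mu^\pm = \mathbf{E} + \chi$ holds as in Eq. (17).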
Thus, a truly cryptic process has $C_\mu^\pm = \chi$ or, equivalently, $\mathbf{E} = 0$. In this circumstance, little or nothing can be learned about the process's hidden organization from measurements. This would be perfect encryption.

We will find it useful to discuss the two contributions to $\chi$ separately. Denote these $\chi^+ = H[\mathcal{S}^+ | \mathcal{S}^-]$ and $\chi^- = H[\mathcal{S}^- | \mathcal{S}^+]$.

The preceding results can be compactly summarized in an information diagram that uses the ε-machine representation of a process; see Ref. [?] and Ref. [3]. They also lead to a new classification scheme for stationary processes; see Ref. [13]. In the following, we concentrate instead on how to calculate the preceding quantities, giving a complete informational and structural analysis of general processes.

ALTERNATIVE PRESENTATIONS 

The ε-machine is a process's unique, minimal unifilar presentation. Now we introduce two alternative presentations, which need not be ε-machines, that will be used in the calculation of E. Since the states of these alternative presentations are not causal states, we will use $\mathcal{R}_t$, rather than $\mathcal{S}_t$, to denote the random variable for their state at time t.



Time-Reversed Presentation 

Any machine M transitions from the current state $\mathcal{R}$ to the next state $\mathcal{R}'$ on the current symbol x:

  $T^{(x)}_{\mathcal{R}\mathcal{R}'} = \Pr(X = x, \mathcal{R}' | \mathcal{R})$ .   (19)

Note that $T = \sum_{\{x\}} T^{(x)}$ is a stochastic matrix with principal eigenvalue 1 and left eigenvector $\pi$, which gives $\Pr(\mathcal{R})$. Recall that the Perron-Frobenius theorem applied to irreducible stochastic matrices guarantees the uniqueness of $\pi$.

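Using standard probability rules to interchange $\mathcal{R}$ and $\mathcal{R}'$, we can construct a new set of transition matrices which defines a presentation of the process that generates the symbols in reverse order. It is useful to consider a time-reversing operator acting on a machine. Denoting it $\mathcal{T}$, $\widetilde{M} = \mathcal{T}(M)$ is the time-reversed presentation of M. It has symbol-labeled transition matrices:

  $\widetilde{T}^{(x)}_{\mathcal{R}'\mathcal{R}} = \Pr(X = x, \mathcal{R} | \mathcal{R}') = T^{(x)}_{\mathcal{R}\mathcal{R}'} \frac{\Pr(\mathcal{R})}{\Pr(\mathcal{R}')}$   (20)

and stochastic matrix $\widetilde{T} = \sum_{\{x\}} \widetilde{T}^{(x)}$.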

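Proposition 10. The stationary distribution $\widetilde{\pi}$ over the time-reversed presentation states is the same as the stationary distribution $\pi$ of M.

Proof. We assume $\widetilde{\pi} = \pi$, the left eigenvector of T, and verify the assumption, recalling the uniqueness of $\pi$. We have:

  $\widetilde{\pi}_\rho = \sum_{\rho'} \widetilde{\pi}_{\rho'} \widetilde{T}_{\rho'\rho}$
  $= \sum_{\rho'} \widetilde{\pi}_{\rho'} T_{\rho\rho'} \frac{\pi_\rho}{\pi_{\rho'}}$
  $= \sum_{\rho'} \pi_\rho T_{\rho\rho'}$
  $= \pi_\rho$ .

In the second to last line, we recall the assumption $\widetilde{\pi}_{\rho'} = \pi_{\rho'}$. And in the final, we note that T is stochastic. □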
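
As a numerical companion to Eq. (20) and Prop. 10, the following Python sketch builds the time-reversed matrices and checks that the stationary distribution is preserved. It assumes every state has positive stationary probability; the function names and the two-state example are hypothetical and used only for illustration.

```python
import numpy as np

def stationary_distribution(T):
    """Normalized left eigenvector (eigenvalue 1) of sum_x T[x]."""
    T_total = sum(T.values())
    evals, evecs = np.linalg.eig(T_total.T)
    pi = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
    return pi / pi.sum()

def time_reverse(T):
    """Eq. (20): Ttilde[x][r', r] = T[x][r, r'] * pi[r] / pi[r']."""
    pi = stationary_distribution(T)
    # Scale row r by pi[r], transpose, then divide row r' by pi[r'].
    return {x: (Tx * pi[:, None]).T / pi[:, None] for x, Tx in T.items()}

def word_probability(T, pi, word):
    """Pr(word) = pi T^(x_0) ... T^(x_{L-1}) 1, in the machine's scan order."""
    v = np.asarray(pi, dtype=float)
    for x in word:
        v = v @ T[x]
    return float(v.sum())

# Hypothetical forward presentation (binary sequences with no consecutive 0s).
forward = {
    0: np.array([[0.0, 0.5], [0.0, 0.0]]),
    1: np.array([[0.5, 0.0], [1.0, 0.0]]),
}
pi = stationary_distribution(forward)
reverse = time_reverse(forward)

# The reversed machine reads the word 1,1,0 in the order 0,1,1.
print(word_probability(forward, pi, [1, 1, 0]))
print(word_probability(reverse, pi, [0, 1, 1]))
```

In this example the reversed matrices are not unifilar: on symbol 1 the first state can move to either state. The reversed presentation nonetheless assigns each word the same probability as the forward one.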

Finally, when we consider the product of transition matrices over a given sequence w, it is useful to simplify notation as follows:

  $T^{(w)} \equiv T^{(x_0)} T^{(x_1)} \cdots T^{(x_{L-1})}$ .



Mixed-State Presentation 

The states of machine M can be treated as a standard basis in a vector space. Then, any distribution over these states is a linear combination of those basis vectors. Following Ref. [?], these distributions are called mixed states.

Now we focus on a special subset of mixed states and define $\mu(w)$ as the distribution over the states of M that is induced after observing w:

  $\mu(w) \equiv \Pr(\mathcal{R}_L | X_0^L = w)$   (21)
  $= \frac{\Pr(X_0^L = w, \mathcal{R}_L)}{\Pr(X_0^L = w)}$   (22)
  $= \frac{\pi T^{(w)}}{\pi T^{(w)} \mathbf{1}}$ ,   (23)

where $X_0^L$ is shorthand for an undetermined sequence of L measurements beginning at time t = 0 and $\mathbf{1}$ is a column vector of 1s. In the last line, we write the probabilities in terms of the stationary distribution and the transition matrices of M. This expansion is valid for any machine that generates the process in the forward-scan (left-to-right) direction.

If we consider the entire set of such mixed states, then we can construct a presentation of the process by specifying the transition matrices:

  $\Pr(x, \mu(wx) | \mu(w)) = \frac{\Pr(wx)}{\Pr(w)}$   (24)
  $= \mu(w) T^{(x)} \mathbf{1}$ .   (25)

Note that many words can induce the same mixed state. As with the time-reversed presentation, it will be useful to define a corresponding operator $\mathcal{U}$ that acts on a machine M, returning its mixed-state presentation $\mathcal{U}(M)$.
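
Eqs. (23) and (25) can be evaluated numerically as follows. This Python sketch takes the stationary distribution as an explicit argument; the function names and the two-state example are hypothetical and used only for illustration.

```python
import numpy as np

def mixed_state(T, pi, word):
    """Eq. (23): mu(w) = pi T^(w) / (pi T^(w) 1)."""
    v = np.asarray(pi, dtype=float)
    for x in word:
        v = v @ T[x]
    total = v.sum()
    if total == 0:
        raise ValueError("word has zero probability under this machine")
    return v / total

def next_symbol_probability(T, mu, x):
    """Eq. (25): Pr(x, mu(wx) | mu(w)) = mu(w) T^(x) 1."""
    return float((np.asarray(mu, dtype=float) @ T[x]).sum())

# Hypothetical presentation (binary sequences with no consecutive 0s).
machine = {
    0: np.array([[0.0, 0.5], [0.0, 0.0]]),
    1: np.array([[0.5, 0.0], [1.0, 0.0]]),
}
pi = np.array([2/3, 1/3])  # stationary distribution of this example

mu_after_0 = mixed_state(machine, pi, [0])
mu_after_1 = mixed_state(machine, pi, [1])
print(mu_after_0, mu_after_1)
print(next_symbol_probability(machine, mu_after_0, 1))
```

Because this example is unifilar and each single symbol identifies the state, both mixed states are basis vectors. For a nonunifilar machine the mixed states generally have support on several states.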
CALCULATING EXCESS ENTROPY

We are now ready to describe how to calculate the excess entropy, using the time-symmetric perspective. Generally, our goal is to obtain a conditional distribution $\Pr(\mathcal{S}^+ | \mathcal{S}^-)$ which, when combined with the ε-machines, yields a direct calculation of E via Thm. 1. This is a two-step procedure which begins with $M^+$, calculates $\widetilde{M}^+$, and ends with $M^-$. One could also start with $M^-$ to obtain $M^+$. These possibilities are captured in the diagram:

  $M^+ \xrightarrow{\mathcal{T}} \widetilde{M}^+ \xrightarrow{\mathcal{U}} M^-$ ,
  $M^- \xrightarrow{\mathcal{T}} \widetilde{M}^- \xrightarrow{\mathcal{U}} M^+$ .   (26)

In detail, we begin with and reverse the direction 
o^time by constructing the time-reversed presentation 
A/+ = T(7\/+). Then, we construct the mixed-state pre- 
sentation iY(A/+) of the time-reversed presentation to ob- 
tain M~. 

Note that $\mathcal{T}$ acting on $M^+$ does not generically yield another $\epsilon$-machine. (This was not the purpose of $\mathcal{T}$.) However, the states will still be useful when we construct the mixed-state presentation of $\widetilde{M}^+$. This is because the states, which serve as basis states in the mixed-state presentation, are in a one-to-one correspondence with the forward causal states of $M^+$. This correspondence was established by Prop. 10.

Also, note that $\mathcal{U}$ is not guaranteed to construct a minimal presentation of the process. However, this does not appear to be an issue when working with time-reversed presentations of an $\epsilon$-machine. We leave it as a conjecture that $\mathcal{U}(\mathcal{T}(M))$ is always minimal. Even so, the Appendix demonstrates that an appropriate sum can be carried out which always yields the desired conditional distribution.

Returning to the two-step procedure, one must construct the mixed-state presentation of $\widetilde{M}^+$. It is helpful to keep the hidden process lattice of Table I in mind. Since $\widetilde{M}^+$ generates the process from right-to-left, it encounters symbols of $w$ in reverse order. The consequence of this is that the form of the mixed state changes slightly. However, it still represents the distribution over the current state induced by seeing $w$. We denote this new form by $\nu(w)$:



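$$\begin{aligned}
\nu(w) &\equiv \Pr(\mathcal{R}_0 | X_0^L = w) && (27) \\
&= \frac{\Pr(\mathcal{R}_0, X_0^L = w)}{\Pr(X_0^L = w)} && (28) \\
&= \frac{\widetilde{\pi} \widetilde{T}^{(\widetilde{w})}}{\widetilde{\pi} \widetilde{T}^{(\widetilde{w})} \mathbf{1}} , && (29)
\end{aligned}$$

where $\widetilde{w}$ denotes $w$ read in reverse order, and $\widetilde{\pi}$ and $\widetilde{T}$ are the stationary distribution and transition matrices of a machine that generates the process from right-to-left, respectively. In this procedure, we are making use of $\widetilde{M}^+$ and thus, $\widetilde{\pi}$ and $\widetilde{T}$.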

Similarly, if we consider the entire set of such mixed states, we can construct a presentation of the process by specifying the transition matrices:

$$\begin{aligned}
\Pr(x, \nu(xw) | \nu(w)) &= \frac{\Pr(xw)}{\Pr(w)} && (30) \\
&= \nu(w) \widetilde{T}^{(x)} \mathbf{1} . && (31)
\end{aligned}$$



Focusing again on $M^+$, we construct $\widetilde{M}^+ = \mathcal{T}(M^+)$. Since $\widetilde{\pi} = \pi$, we can equate $\mathcal{R}_t = \mathcal{S}_t^+$ and the mixed states $\nu(w)$ are actually informing us about the causal states in $M^+$:

$$\nu(w) = \Pr(\mathcal{R}_0 | X_0^L = w) = \Pr(\mathcal{S}_0^+ | X_0^L = w) .$$

Whenever the mixed-state presentation is an $\epsilon$-machine, each distribution corresponds to exactly one reverse causal state. Thus, if $w$ induces $\nu(w)$, then $\nu(w)$ is the reverse causal state induced by $w$. This allows us to reduce the form of $\nu(w)$ even further so that the conditioned variable is a reverse causal state. Continuing,

$$\nu(w) = \Pr(\mathcal{S}_0^+ | X_0^L = w) = \Pr(\mathcal{S}_0^+ | \mathcal{S}_0^- = \epsilon^-(w)) .$$

Hence, we can calculate $H[\mathcal{S}^+|\mathcal{S}^-]$ and so obtain $\mathbf{E}$.
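Once the conditional distribution is in hand, the last step is arithmetic. A minimal sketch, assuming the inputs are given as plain lists (the function names and the two toy inputs in the usage note are hypothetical):

```python
from math import log2

def entropy(dist):
    """Shannon entropy in bits of a probability vector."""
    return -sum(x * log2(x) for x in dist if x > 0)

def excess_entropy(p_rev, cond):
    """E = H[S+] - H[S+|S-].

    p_rev[j]   = Pr(S- = j)
    cond[j][i] = Pr(S+ = i | S- = j)
    """
    n_fwd = len(cond[0])
    # Marginal over forward causal states, obtained by summing out S-.
    p_fwd = [sum(p_rev[j] * cond[j][i] for j in range(len(p_rev)))
             for i in range(n_fwd)]
    h_cond = sum(p_rev[j] * entropy(cond[j]) for j in range(len(p_rev)))
    return entropy(p_fwd) - h_cond
```

If the reverse state says nothing about the forward state, the result is zero; if the map is one-to-one, the result is the full entropy of the state distribution, for example `excess_entropy([2/3, 1/3], [[1, 0], [0, 1]])` is about 0.9183 bits.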



CALCULATIONAL EXAMPLE 

To clarify the procedure, we apply it to the Random, 
Noisy Copy (RnC) Process. The emphasis is on the vari- 
ous process presentations and mixed states that are used 
to calculate the excess entropy. In the next section, ad- 
ditional examples are provided which skip over these cal- 
culational details and, instead, focus on the analysis and 
interpretation. 

The RnC generates a random bit with bias p. If that 
bit is a 0, it is copied so that the next output is also 
0. However, if the bit is a 1, then with probability q,
the 1 is not copied and 0 is output instead. The RnC
Process is related to the binary asymmetric channel of 
communication theory (2^ . 

The forward $\epsilon$-machine has three recurrent causal states $\mathcal{S}^+ = \{A, B, C\}$ and is shown in Fig. 1(a). The transition matrices $T^{(x)}$ specify $\Pr(X_0 = x, \mathcal{S}_1^+ | \mathcal{S}_0^+)$ and, with rows and columns ordered $(A, B, C)$, are given by:

$$T^{(0)} = \begin{pmatrix} 0 & p & 0 \\ 1 & 0 & 0 \\ q & 0 & 0 \end{pmatrix} \quad \text{and} \quad T^{(1)} = \begin{pmatrix} 0 & 0 & 1-p \\ 0 & 0 & 0 \\ 1-q & 0 & 0 \end{pmatrix} .$$



(One must explicitly calculate the equivalence classes of 
histories { a; } specified in Eq. ((T|) and their associated 
future conditional distributions Pr(X|V) to obtain the 
e-machine causal states and transitions.) 

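These matrices are used to calculate the stationary distribution $\pi$ over the causal states, which is given by the left eigenvector of the stochastic matrix $T = T^{(0)} + T^{(1)}$. Ordering the components $(A, B, C)$:

$$\Pr(\mathcal{S}^+) = \frac{1}{2} \begin{pmatrix} 1 & p & 1-p \end{pmatrix} .$$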



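Using the $T^{(x)}$ and $\pi$, we create the time-reversed presentation $\widetilde{M}^+ = \mathcal{T}(M^+)$. This is shown in Fig. 1(b). Notice that the machine is not unifilar, and so it is clearly not an $\epsilon$-machine. The transition matrices for the time-reversed presentation, with rows and columns ordered $(A, B, C)$, are given by:

$$\widetilde{T}^{(0)} = \begin{pmatrix} 0 & p & q(1-p) \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \quad \text{and} \quad \widetilde{T}^{(1)} = \begin{pmatrix} 0 & 0 & (1-q)(1-p) \\ 0 & 0 & 0 \\ 1 & 0 & 0 \end{pmatrix} .$$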



As with $M^+$, we calculate the stationary distribution of $\widetilde{M}^+$, denoted $\widetilde{\pi}$. However, we showed that the stationary distributions for $M$ and $\mathcal{T}(M)$ are identical.

Now we are in a position to calculate the mixed-state presentation, $M^- = \mathcal{U}(\widetilde{M}^+)$, shown in Fig. 1(c). Generally, causal states can be categorized into types [2^. Of these, the calculation of $\mathbf{E}$ depends only on the reachable recurrent causal states. The construction of the mixed-state presentation will generate other types of causal states, such as transient causal states, but we eventually remove them.

To begin, we start with the empty word, $w = \lambda$, and append 0 and 1 to consider $\nu(0)$ and $\nu(1)$, respectively, and calculate:

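$$\nu(0) = \Pr(\mathcal{S}_0^+ | X_0 = 0) = \frac{\widetilde{\pi}\widetilde{T}^{(0)}}{\widetilde{\pi}\widetilde{T}^{(0)}\mathbf{1}} = \frac{\big(p,\; p,\; q(1-p)\big)}{2p + q(1-p)}$$

and

$$\nu(1) = \Pr(\mathcal{S}_0^+ | X_0 = 1) = \frac{\widetilde{\pi}\widetilde{T}^{(1)}}{\widetilde{\pi}\widetilde{T}^{(1)}\mathbf{1}} = \frac{\big(1,\; 0,\; 1-q\big)}{2-q} .$$

FIG. 1: The presentations used to calculate the excess entropy for the RnC Process: (a) $M^+$, (b) $\widetilde{M}^+ = \mathcal{T}(M^+)$, and (c) $M^- = \mathcal{U}(\widetilde{M}^+)$. Edge labels $t|x$ give the probability $t = T^{(x)}_{\mathcal{R}\mathcal{R}'}$ of making a transition and seeing symbol $x$.

For each mixed state, we append 0s and 1s and calculate again:

$$\begin{aligned}
\nu(00) &= \Pr(\mathcal{S}_0^+ | X_0^2 = 00) = \frac{\widetilde{\pi}\widetilde{T}^{(0)}\widetilde{T}^{(0)}}{\widetilde{\pi}\widetilde{T}^{(0)}\widetilde{T}^{(0)}\mathbf{1}} , \\
\nu(01) &= \Pr(\mathcal{S}_0^+ | X_0^2 = 01) , \\
\nu(10) &= \Pr(\mathcal{S}_0^+ | X_0^2 = 10) , \text{ and} \\
\nu(11) &= \Pr(\mathcal{S}_0^+ | X_0^2 = 11) = \frac{\widetilde{\pi}\widetilde{T}^{(1)}\widetilde{T}^{(1)}}{\widetilde{\pi}\widetilde{T}^{(1)}\widetilde{T}^{(1)}\mathbf{1}} .
\end{aligned}$$

Note that

$$\nu(10) = \frac{\nu(0)\widetilde{T}^{(1)}}{\nu(0)\widetilde{T}^{(1)}\mathbf{1}} . \qquad (32)$$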



This latter form is important in that it allows us to build 
mixed states from prior mixed states by prepending a 
symbol. 

One continues constructing mixed states of longer and longer words until no more new mixed states appear. As an example, $\nu(1001) = \nu(111001)$ for the right-scanned RnC Process.
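The "continue until no new mixed states appear" loop can be sketched as a worklist search. This is a minimal illustration under stated assumptions: the reversed matrices are obtained from the RnC description via $\widetilde{T}^{(x)}_{\rho\rho'} = T^{(x)}_{\rho'\rho}\pi_{\rho'}/\pi_{\rho}$, the parameters are fixed at $p = q = 1/2$, and all function names are hypothetical. Exact fractions make the test for a repeated mixed state a plain equality check.

```python
from fractions import Fraction as F

def step(state, Tx):
    """Scan one more symbol: return (probability, normalized next mixed state)."""
    n = len(state)
    v = [sum(state[i] * Tx[i][j] for i in range(n)) for j in range(n)]
    total = sum(v)
    return total, (tuple(a / total for a in v) if total else None)

def mixed_states(pi, T_rev):
    """Enumerate every mixed state reachable from pi, with its outgoing edges."""
    edges, work = {}, [tuple(pi)]
    while work:
        s = work.pop()
        if s in edges:
            continue
        edges[s] = {}
        for x, Tx in T_rev.items():
            prob, nxt = step(s, Tx)
            if prob > 0:
                edges[s][x] = (prob, nxt)
                work.append(nxt)
    return edges

def reachable(edges, s):
    seen, stack = set(), [s]
    while stack:
        u = stack.pop()
        for _, v in edges[u].values():
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def recurrent(edges):
    """A state is recurrent if every state it can reach can also reach it."""
    reach = {s: reachable(edges, s) for s in edges}
    return {s for s in edges if all(s in reach[t] for t in reach[s])}

p = q = F(1, 2)
pi = (F(1, 2), p / 2, (1 - p) / 2)                       # order (A, B, C)
T_rev = {"0": [[0, p, q * (1 - p)], [1, 0, 0], [0, 0, 0]],
         "1": [[0, 0, (1 - q) * (1 - p)], [0, 0, 0], [1, 0, 0]]}
edges = mixed_states(pi, T_rev)
rec = recurrent(edges)
```

For these parameter values the search terminates with a finite set of mixed states, three of which are recurrent; the rest are transient and are discarded before computing the excess entropy.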
To illustrate calculating the transition probabilities, consider the transition from $\nu(00)$ to $\nu(100)$ [s^]. By Eq. (31), we have

$$\Pr(1, \nu(100) | \nu(00)) = \Pr(1|00) = \frac{\Pr(100)}{\Pr(00)} = \frac{1-p}{1+p+q-pq} .$$

After constructing the mixed-state presentation, one calculates the stationary state distribution. The causal states which have $\Pr(\mathcal{S}^-) > 0$ are the recurrent causal states. These are $\mathcal{S}^- = \{D, E, F\}$:



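$$\begin{aligned}
D &= \nu(1001) = (0,\; 0,\; 1) , \\
E &= \nu(100) = (1,\; 0,\; 0) , \\
F &= \nu(10) = \left(0,\; \frac{p}{p+q(1-p)},\; \frac{q(1-p)}{p+q(1-p)}\right) ,
\end{aligned}$$

with components ordered $(A, B, C)$.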



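These mixed states give $\Pr(\mathcal{S}_0^+ | \mathcal{S}_0^-)$ which, when combined with $\Pr(\mathcal{S}^+)$, allows us to calculate:

$$\mathbf{E} = I[\mathcal{S}^-;\mathcal{S}^+] = C_\mu^+ - \chi^+ ,$$

with

$$C_\mu^+ = 1 + \frac{H(p)}{2}$$

and

$$\chi^+ = \frac{p + q(1-p)}{2} \, H\!\left(\frac{p}{p + q(1-p)}\right) ,$$

where $H(\cdot)$ is the binary entropy function.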
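As a sanity check on the closed form, one can compare it with a direct mutual-information calculation. The sketch below assumes the joint distribution over forward states $(A, B, C)$ and reverse states $(D, E, F)$ that follows from the RnC description (the reverse state fixes the forward state exactly except in $F$); the function names are hypothetical.

```python
from math import log2

def entropy(dist):
    return -sum(x * log2(x) for x in dist if x > 0)

def binary_entropy(x):
    return entropy([x, 1 - x])

def rnc_closed_form(p, q):
    """E = C_mu^+ - chi^+ with C_mu^+ = 1 + H(p)/2."""
    r = p + q * (1 - p)
    return 1 + binary_entropy(p) / 2 - (r / 2) * binary_entropy(p / r)

def rnc_direct(p, q):
    """Mutual information I[S-; S+] from an assumed joint distribution."""
    joint = {("A", "E"): 0.5,
             ("C", "D"): (1 - p) * (1 - q) / 2,
             ("B", "F"): p / 2,
             ("C", "F"): q * (1 - p) / 2}
    fwd, rev = {}, {}
    for (a, b), pr in joint.items():
        fwd[a] = fwd.get(a, 0) + pr
        rev[b] = rev.get(b, 0) + pr
    return sum(pr * log2(pr / (fwd[a] * rev[b]))
               for (a, b), pr in joint.items() if pr > 0)
```

The two functions agree to floating-point precision for interior parameter values, which is a useful regression test when reconstructing formulas like these by hand.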

EXAMPLES 



With the calculational procedure laid out, we now analyze the information processing properties of several examples, two of which are familiar from symbolic dynamics.



Even Process 

The Even Process is a stochastic generalization of the 
Even System: the canonical example of a sofic subshift,
a symbolic dynamical system that cannot be expressed as 
a subshift of finite type 3, 2^ . Although it has only two 
recurrent causal states, the Even Process cannot be ex- 
pressed as any finite Markov chain over measurement se- 
quences. Somewhat surprisingly, it turns out to be quite 



simple in terms of the properties we are addressing. As 
we will now show, the mapping between forward and reverse causal states is one-to-one and so $\chi = 0$. All of its internal state information is present in measurements; we call it an explicit, or non-cryptic process.

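Its forward $\epsilon$-machine has two recurrent causal states $\mathcal{S}^+ = \{A, B\}$ and transition matrices (lit :

$$T^{(0)} = \begin{pmatrix} p & 0 \\ 0 & 0 \end{pmatrix} \quad \text{and} \quad T^{(1)} = \begin{pmatrix} 0 & 1-p \\ 1 & 0 \end{pmatrix} .$$

Figure 2(a) gives $M^+$, while 2(b) gives $M^-$. We see that the $\epsilon$-machines are the same and so the Even Process is causally reversible ($\Xi = 0$). Note that $\widetilde{M}^+$ is unifilar.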

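We can give general expressions for the information processing invariants as a function of the probability $p = \Pr(0|A)$ of the self-loop. A simple calculation shows that

$$\Pr(\mathcal{S}^+) = \begin{pmatrix} \Pr(A) & \Pr(B) \end{pmatrix} = \begin{pmatrix} \dfrac{1}{2-p} & \dfrac{1-p}{2-p} \end{pmatrix}$$

and

$$\Pr(\mathcal{S}^-) = \begin{pmatrix} \Pr(C) & \Pr(D) \end{pmatrix} = \begin{pmatrix} \dfrac{1}{2-p} & \dfrac{1-p}{2-p} \end{pmatrix} .$$

And so, $C_\mu^{\pm} = H(1/(2-p))$ and $h_\mu = H(p)/(2-p)$. Since $\chi = 0$ for all $p$, we have $\mathbf{E} = C_\mu^{\pm}$.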
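These expressions are easy to evaluate. A small sketch (the function names are hypothetical) that returns the Even Process invariants for a given self-loop probability:

```python
from math import log2

def binary_entropy(x):
    """H(x) in bits, with H(0) = H(1) = 0."""
    if x in (0, 1):
        return 0.0
    return -x * log2(x) - (1 - x) * log2(1 - x)

def even_process_invariants(p):
    """Statistical complexity, entropy rate, and excess entropy at self-loop probability p."""
    c_mu = binary_entropy(1 / (2 - p))   # entropy of (1/(2-p), (1-p)/(2-p))
    h_mu = binary_entropy(p) / (2 - p)   # only state A is stochastic
    return {"C_mu": c_mu, "h_mu": h_mu, "E": c_mu}  # E = C_mu since chi = 0
```

For example, `even_process_invariants(0.5)` gives a statistical complexity of about 0.9183 bits and an entropy rate of 2/3 bits per measurement.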



FIG. 2: Forward and reverse $\epsilon$-machines for the Even Process: (a) $M^+$ and (b) $M^-$. (c) The bidirectional machine $M^{\pm}$. Edge labels are prefixed by the scan direction $\{-, +\}$.

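Now, let's analyze its bidirectional machine, which is shown in Fig. 2(c). The reverse and forward maps are given by:

$$\Pr(\mathcal{S}^+|\mathcal{S}^-) = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \quad (\text{rows } C, D;\ \text{columns } A, B)$$

and

$$\Pr(\mathcal{S}^-|\mathcal{S}^+) = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \quad (\text{rows } A, B;\ \text{columns } C, D) .$$

From which one calculates that $\Pr(\mathcal{S}^{\pm}) = \Pr(AC, BD) = (2/3, 1/3)$ for $p = 1/2$. This and the switching maps above give $C_\mu^{\pm} = H[\mathcal{S}^{\pm}] = H(2/3) \approx 0.9183$ bits and $\mathbf{E} = I[\mathcal{S}^+;\mathcal{S}^-] \approx 0.9183$ bits.

Direct inspection of $M^+$ and $M^-$ shows that both $\epsilon$-machines are reverse unifilar. And this is reflected in the fact that $C_\mu^+ = C_\mu^- = \mathbf{E}$, verifying a proposition of Ref. M.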




FIG. 3: The Even Process's information processing properties (the statistical complexities and the crypticity) as its self-loop probability $p$ varies. The colored area bounded by the curves shows the magnitude of $\mathbf{E}$.

Without going into details to be reported elsewhere, 
the Even Process is also notable since it is difficult to 
empirically estimate its E. (The convergence as a func- 
tion of the number of measurements is extremely slow.) 
Viewed in terms of the quantities $C_\mu^+$, $C_\mu^-$, $\chi^{\pm}$, and $\Xi$, though, it is quite simple. This illustrates one strength of the time-symmetric analysis. The latter's new and independent set of informational measures leads one to explore new regions of process space (see Fig. 3) and to ask structural questions not previously capable of being asked (or answered, for that matter). To see exactly why
the Even Process is so simple, let's look at its causal 
states. 



Its histories can be divided into two classes: those that end with an even number of 1s and those that end with an odd number of 1s. Similarly, its futures divide into two classes: those that begin with an even number of 1s and those that begin with an odd number of 1s. The analysis here shows that these classes are causal states $A$, $B$, $C$, and $D$, respectively; see Fig. 2.

Beginning with a bi-infinite string, wherever we choose to split it into $(\overleftarrow{X}, \overrightarrow{X})$, we can be in one of only two situations: either $(A, C)$ or $(B, D)$, where $A$ ($C$) ends (begins) with an even number of 1s, and $B$ ($D$) ends (begins) with an odd number of 1s. This one-to-one correspondence simultaneously implies causal reversibility ($\Xi = 0$) and explicitness ($\chi = 0$). Thinking in terms of the bidirectional machine, we can predict and retrodict, changing
direction as often as we like and forever maintain op- 
timal predictability and retrodictability. Since we can 
switch directions with no loss of information, there is no 
asymmetry in the loss; this reflects the process's causal 
reversibility. 

Plotting the statistical complexities and the crypticity, Fig. 3 rather directly illustrates these properties and shows that they are maintained across the entire process family as the self-loop probability $p$ is varied.



Golden Mean Process 

The Golden Mean Process generates all binary sequences except for those with two contiguous 0s. Like the Even Process, it has two recurrent causal states. Unlike the Even Process, its support is a subshift of finite type; describable by a chain over three Markov states that correspond to the length-2 words 01, 10, and 11. Nominally, it is considered to be a very simple process. However, it reveals several surprising subtleties. $M^+$ and $M^-$ are the same $\epsilon$-machine: it is causally reversible ($\Xi = 0$). However, $M^{\pm}$ has three states and the forward and reverse state maps are no longer the identity. Thus, $\chi > 0$ and the Golden Mean Process is cryptic and so hides much of its state information from an observer.

Its forward $\epsilon$-machine has two recurrent causal states $\mathcal{S}^+ = \{A, B\}$ and transition matrices [l^:

$$T^{(0)} = \begin{pmatrix} 0 & 1-p \\ 0 & 0 \end{pmatrix} \quad \text{and} \quad T^{(1)} = \begin{pmatrix} p & 0 \\ 1 & 0 \end{pmatrix} .$$

Figure 4(a) gives $M^+$, while (b) gives $M^-$. We see that the $\epsilon$-machines are the same and so the Golden Mean Process is causally reversible ($\Xi = 0$).

Again, we can give general expressions for the information processing invariants as a function of the probability $p = \Pr(1|A)$ of the self-loop. The state-to-state transition matrix is the same as that for the Even Process and we also have the same causal state probabilities. Thus, we have $C_\mu^{\pm} = H(1/(2-p))$ and $h_\mu = H(p)/(2-p)$ again, just as for the Even Process above. Indeed, a quick comparison of the state-transition diagrams does not reveal any overt difference with the Even Process's $\epsilon$-machines.



FIG. 4: Forward and reverse $\epsilon$-machines for the Golden Mean Process: (a) $M^+$ and (b) $M^-$. (c) The bidirectional machine $M^{\pm}$.

However, since $\chi \neq 0$ for $p \in (0, 1)$ and since the process is also a one-dimensional spin chain, we have $\mathbf{E} = C_\mu - R h_\mu$ with $R = 1$. (Recall Eq. ®.) Thus,

$$\mathbf{E} = H\!\left(\frac{1}{2-p}\right) - \frac{H(p)}{2-p} . \qquad (33)$$

Putting these closed-form expressions together gives us a graphical view of how the various information invariants change as the process's parameter is varied. This is shown in Fig. 5.

In contrast to the Even Process, the excess entropy is substantially less than the statistical complexities, the signature of a cryptic process: $\chi^+ = \chi^- = H(p)/(2-p)$.

The origin of its crypticity is found by analyzing the bidirectional machine, which is shown in Fig. 4(c). The reverse and forward maps are given by:

$$\Pr(\mathcal{S}^+|\mathcal{S}^-) = \begin{pmatrix} p & 1-p \\ 1 & 0 \end{pmatrix} \quad (\text{rows } C, D;\ \text{columns } A, B)$$

and

$$\Pr(\mathcal{S}^-|\mathcal{S}^+) = \begin{pmatrix} p & 1-p \\ 1 & 0 \end{pmatrix} \quad (\text{rows } A, B;\ \text{columns } C, D) .$$

FIG. 5: The Golden Mean Process's information processing invariants as its self-loop probability $p$ varies. Colored areas bounded by the curves give the magnitude at each $p$ of $\chi^-$, $\mathbf{E}$, and $\chi^+$.

From $M^{\pm}$, one can calculate the stationary distribution over the bidirectional causal states: $\Pr(\mathcal{S}^{\pm}) = \Pr(AC, AD, BC) = (p, 1-p, 1-p)/(2-p)$. For $p = 1/2$, we obtain $C_\mu^{\pm} = H[\mathcal{S}^{\pm}] = \log_2 3 \approx 1.5850$ bits, but an $\mathbf{E} = I[\mathcal{S}^+;\mathcal{S}^-] = 0.2516$ bits. Thus, $\mathbf{E}$ is substantially less than the $C_\mu$s, a cryptic process: $\chi^{\pm} \approx 1.3334$ bits.
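The quoted numbers can be reproduced from the joint distribution over bidirectional causal states. A minimal sketch, with hypothetical function names, that computes all the invariants from any joint distribution $\Pr(\mathcal{S}^+, \mathcal{S}^-)$ and applies it to the Golden Mean Process:

```python
from math import log2

def entropy(dist):
    return -sum(x * log2(x) for x in dist if x > 0)

def invariants(joint):
    """joint maps (forward state, reverse state) -> probability."""
    fwd, rev = {}, {}
    for (a, b), pr in joint.items():
        fwd[a] = fwd.get(a, 0) + pr
        rev[b] = rev.get(b, 0) + pr
    c_plus, c_minus = entropy(fwd.values()), entropy(rev.values())
    c_pm = entropy(joint.values())            # H[S+, S-]
    e = c_plus + c_minus - c_pm               # I[S+; S-]
    return {"C+": c_plus, "C-": c_minus, "C+-": c_pm, "E": e,
            "chi+": c_plus - e, "chi-": c_minus - e}

def golden_mean_joint(p):
    z = 2 - p
    return {("A", "C"): p / z, ("A", "D"): (1 - p) / z, ("B", "C"): (1 - p) / z}

gm = invariants(golden_mean_joint(0.5))
```

At $p = 1/2$ this gives $\mathbf{E} \approx 0.2516$ bits and $C_\mu^{\pm} \approx 1.5850$ bits; the total crypticity $\chi^+ + \chi^-$ comes out near 1.3333 bits, matching the value in the text up to rounding.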

The Golden Mean Process is a perfect complement to 
the Even Process. Previously, it was viewed as a simple 
process for many reasons: It is based on a subshift of 
finite type and order-1 Markov, the causal-state process 
is itself a Golden Mean Process, it is microscopically re- 
versible, and E was exactly calculable (even before the 
introduction of the methods here). However, the preced- 
ing analysis shows that the Golden Mean Process displays 
a new feature that the Even Process does not — crypticity. 

We can gain an intuitive understanding of this by thinking about classes of histories and futures. In this case, a bi-infinite string can be split in three ways $(\overleftarrow{X}, \overrightarrow{X})$: $(A, C)$, $(A, D)$, or $(B, C)$, where $A$ ($C$) is any past (future) that ends (begins) with a 1 and $B$ ($D$) is any past (future) that ends (begins) with a 0. The split $(B, D)$ never occurs, since it would place two 0s side by side. In terms of the bidirectional machine, there is a cost associated with changing direction. It is the mixing among the causal states above that is responsible for this cost. Further, this cost is symmetric because of the microscopic reversibility. Switching from prediction to retrodiction causes a loss of $\chi^+$ bits of memory and a generation of $\chi^-$ bits of uncertainty.

Each complete round-trip state switch (e.g., forward-backward-forward) leads to a geometric reduction in state knowledge of $\mathbf{E}^2/(C_\mu^+ C_\mu^-)$. One can characterize this information loss with a half-life: the number of complete switches required to reduce state knowledge to half of its initial value.
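The half-life can be written out explicitly. The following is a sketch of the arithmetic only, assuming the reduction factor applies multiplicatively on every round trip; the symbols $r$, $K_n$, and $n_{1/2}$ are introduced here for illustration and do not appear in the text.

```latex
% Per-round-trip reduction factor and state knowledge after n round trips:
r = \frac{\mathbf{E}^2}{C_\mu^+ C_\mu^-}, \qquad K_n = r^n K_0 .
% Setting K_n = K_0 / 2 and solving for n gives the half-life:
n_{1/2} = \frac{\ln 2}{\ln (1/r)} = \frac{\ln 2}{\ln\!\big( C_\mu^+ C_\mu^- / \mathbf{E}^2 \big)} .
% Golden Mean Process at p = 1/2 (E = 0.2516, C_\mu^+ = C_\mu^- = 0.9183 bits):
r \approx (0.2516 / 0.9183)^2 \approx 0.075, \qquad n_{1/2} \approx 0.27 .
```

Under this reading, a half-life below one means that a single round trip already discards more than half of the state knowledge.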

Figure 5 shows that these properties are maintained across the entire Golden Mean Process family, except at extremes. When $p = 0$, it degenerates to a simple period-2 process, with $\mathbf{E} = C_\mu^+ = C_\mu^- = C_\mu^{\pm} = 1$ bit of memory. When $p = 1$, it is even simpler, the period-1 process, with no memory. As it approaches this extreme, $\mathbf{E}$ vanishes rapidly, leaving processes with internal state memory dominated by crypticity: $C_\mu^{\pm} \approx \chi^+ + \chi^-$.







Random Insertion Process 

Our final example is chosen to illustrate what appears 
to be the typical case — a cryptic, causally irreversible 
process. This is the random insertion process (RIP) 
which generates a random bit with bias p. If that bit 
is a 1, then it outputs another 1. If the random bit is
a 0, however, it inserts another random bit with bias q,
followed by a 1.

Its forward $\epsilon$-machine has three recurrent causal states $\mathcal{S}^+ = \{A, B, C\}$ and transition matrices, with rows and columns ordered $(A, B, C)$:

$$T^{(0)} = \begin{pmatrix} 0 & p & 0 \\ 0 & 0 & q \\ 0 & 0 & 0 \end{pmatrix} \quad \text{and} \quad T^{(1)} = \begin{pmatrix} 0 & 0 & 1-p \\ 0 & 0 & 1-q \\ 1 & 0 & 0 \end{pmatrix} .$$



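Figure 6(b) shows $M^-$, which has four recurrent causal states $\mathcal{S}^- = \{D, E, F, G\}$. We see that the $\epsilon$-machines are not the same and so the RIP is causally irreversible. A direct calculation gives:

$$\Pr(\mathcal{S}^+) = \begin{pmatrix} \Pr(A) & \Pr(B) & \Pr(C) \end{pmatrix} = \begin{pmatrix} \dfrac{1}{p+2} & \dfrac{p}{p+2} & \dfrac{1}{p+2} \end{pmatrix}$$

and

$$\Pr(\mathcal{S}^-) = \begin{pmatrix} \Pr(D) & \Pr(E) & \Pr(F) & \Pr(G) \end{pmatrix} = \begin{pmatrix} \dfrac{1}{p+2} & \dfrac{1-pq}{p+2} & \dfrac{pq}{p+2} & \dfrac{p}{p+2} \end{pmatrix} .$$

If $p = q = 1/2$, for example, these give us $C_\mu^+ \approx 1.5219$ bits, $C_\mu^- \approx 1.8464$ bits, and $h_\mu = 3/5$ bits per measurement. The causal irreversibility is $\Xi \approx 0.3245$ bits.

FIG. 6: Forward and reverse $\epsilon$-machines for the RIP with $p = q = 1/2$: (a) $M^+$ and (b) $M^-$. (c) The bidirectional machine $M^{\pm}$ also for $p = q = 1/2$. (Reprinted with permission from Ref. m.)

Let's analyze the RIP bidirectional machine, which is shown in Fig. 6(c) for $p = q = 1/2$. The reverse and forward maps are given by:

$$\Pr(\mathcal{S}^+|\mathcal{S}^-) = \begin{pmatrix} 0 & 0 & 1 \\ 2/3 & 1/3 & 0 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix} \quad (\text{rows } D, E, F, G;\ \text{columns } A, B, C)$$

and

$$\Pr(\mathcal{S}^-|\mathcal{S}^+) = \begin{pmatrix} 0 & 1/2 & 0 & 1/2 \\ 0 & 1/2 & 1/2 & 0 \\ 1 & 0 & 0 & 0 \end{pmatrix} \quad (\text{rows } A, B, C;\ \text{columns } D, E, F, G) .$$

Or, for general $p$ and $q$, we have

$$\Pr(\mathcal{S}^+, \mathcal{S}^-) = \frac{1}{p+2} \begin{pmatrix} 0 & 1-p & 0 & p \\ 0 & p(1-q) & pq & 0 \\ 1 & 0 & 0 & 0 \end{pmatrix} \quad (\text{rows } A, B, C;\ \text{columns } D, E, F, G) .$$

By way of demonstrating the exact analysis now possible, $\mathbf{E}$'s closed-form expression for the RIP family is

$$\mathbf{E} = \log_2(p+2) - \frac{p \log_2 p}{p+2} - \frac{1-pq}{p+2} \, H\!\left(\frac{1-p}{1-pq}\right) .$$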
FIG. 7: The Random Insertion Process's information processing invariants as its two probability parameters $p$ and $q$ vary. The central square shows the $(p, q)$ parameter space, with solid and dashed lines indicating the paths in parameter space for each of the other information versus parameter plots. The latter's vertical axes are scaled so that two tick marks measure 1 bit of information. The inset legend indicates the class of process illustrated by the paths. Colored areas give the magnitude of $\chi^-$, $\mathbf{E}$, and $\chi^+$.



The first two terms on the RHS are $C_\mu^+$ and the last is $\chi^+$.

Setting $p = q = 1/2$, one calculates that $\Pr(\mathcal{S}^{\pm}) = \Pr(AE, AG, BE, BF, CD) = (1/5, 1/5, 1/10, 1/10, 2/5)$. This and the joint distribution give $C_\mu^{\pm} = H[\mathcal{S}^{\pm}] \approx 2.1219$ bits, but an $\mathbf{E} = I[\mathcal{S}^+;\mathcal{S}^-] = 1.2464$ bits. That is, the excess entropy (the apparent information) is substantially less than the statistical complexities (stored information), making this a moderately cryptic process: $\chi^{\pm} \approx 0.8755$ bits.
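The RIP figures can be cross-checked in the same way. This sketch assumes the joint distribution over the bidirectional states $(AE, AG, BE, BF, CD)$ with weights $(1-p,\ p,\ p(1-q),\ pq,\ 1)/(p+2)$, which reduces to the probabilities quoted above at $p = q = 1/2$; the function names are hypothetical.

```python
from math import log2

def entropy(dist):
    return -sum(x * log2(x) for x in dist if x > 0)

def rip_joint(p, q):
    z = p + 2
    return {("A", "E"): (1 - p) / z, ("A", "G"): p / z,
            ("B", "E"): p * (1 - q) / z, ("B", "F"): p * q / z,
            ("C", "D"): 1 / z}

def rip_invariants(p, q):
    joint = rip_joint(p, q)
    fwd, rev = {}, {}
    for (a, b), pr in joint.items():
        fwd[a] = fwd.get(a, 0) + pr
        rev[b] = rev.get(b, 0) + pr
    c_plus, c_minus = entropy(fwd.values()), entropy(rev.values())
    c_pm = entropy(joint.values())
    e = c_plus + c_minus - c_pm
    # The magnitude of the difference is reported; no sign convention is assumed here.
    return {"C+": c_plus, "C-": c_minus, "C+-": c_pm, "E": e,
            "irreversibility": abs(c_minus - c_plus)}

def rip_closed_form(p, q):
    """Closed-form E: C_mu^+ minus chi^+."""
    x = (1 - p) / (1 - p * q)
    h = entropy([x, 1 - x])
    return log2(p + 2) - p * log2(p) / (p + 2) - (1 - p * q) / (p + 2) * h

rip = rip_invariants(0.5, 0.5)
```

Agreement between `rip_invariants` and `rip_closed_form` at several parameter values is a quick way to catch a transcription error in either expression.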

Figure 7 shows how the RIP's informational character varies along one-dimensional paths in its parameter space: $(p, q) \in [0, 1]^2$. The four extreme-$p$ and -$q$ paths illustrate that the RIP borders on (i) noncryptic, reversible processes (solid line), (ii) semi-cryptic, irreversible processes (long dash), (iii) cryptic, reversible processes (short dash), and (iv) cryptic, irreversible processes (very short dash). The horizontal path ($q = 0.5$) and two diagonal paths ($p = q$ and $p = 1 - q$) show


the typical cases within the parameter space of cryptic, 
irreversible processes. 



CONCLUSIONS 

Casting stochastic dynamical systems in a time- 
agnostic framework revealed a landscape that quickly led 
one away from familiar entrances, along new and unfa- 
miliar pathways. Old informational quantities were put 
in a new light, new relationships among them appeared, 
and explicit calculation methods became available. The 
most unexpected appearances, though, were the new in- 
formational invariants that emerged and captured novel 
properties of general processes. 

Excess entropy, a familiar quantity in a long-applied 
family of mutual informations, is often estimated [3, 0, 
i, S, 0, S, i, [13, El [13, [ii and is broadly considered an 
important information measure for organization in complex systems. The exact analysis afforded by our time-agnostic framework gave an important calibration in our
studies. Specifically, it showed how difficult accurate es- 
timates of the excess entropy can be. While we intend 
to report on this in some detail elsewhere, suffice it to 
say that the convergence of empirical estimates of E, in 
even very benign (and low statistical complexity) cases, 
can be so slow as to make estimation computationally 
intractable. This problem would never have been clear 
without the closed-form expressions. It, with nothing
else said, calls into doubt many of the reported uses and 
estimations of excess entropy and related mutual infor- 
mation measures. 

Fortunately, we now have access to the analytic cal- 
culation of the excess entropy from the e-machine. Note 
that the latter is no more difficult to estimate than, say, 
estimating the entropy rate of an information source. 
(Both are dominated by obtaining accurate estimates of 
a process's sequence distribution.) Notably, the calcu- 
lation relied on connecting prediction and retrodiction, 
which we accomplished via the composition of the time- 
reversal operation on e-machines and the mixed-state- 
presentation algorithm. As the analyses of the various ex- 
ample processes illustrated, the technique yields closed- 
form expressions for E. More generally, though, the ex- 
plicit relationship between a process's e-machine and its 
excess entropy clearly demonstrates why the statistical 
complexity, and not the excess entropy, is the informa- 
tion stored in the present. 

In addition to the analytical advantage of having E 
in hand, we learned a pointed lesson about the differ- 
ence between prediction (reflected in E) and modeling 
(reflected in $C_\mu$). In particular, a system's causal representation yields more direct access to fundamental invariants than others, such as histograms of word counts
or general hidden Markov models. The differences be- 
tween prediction and modeling unearthed new informa- 
tional quantities — crypticity and causal irreversibility. 

Crypticity describes the amount of stored state infor- 
mation that is not shared in the measurement sequence. 
One might think of this as "wasted" information, al- 
though the minimality of the e-machine suggests that this 
waste is necessary — that is, an intrinsic property of the 
process. Possibly we could better think of this as model- 
ing overhead. 

When analyzing time symmetry, one can use notions 
such as microscopic reversibility or, more broadly, re- 
versible support. We introduced the yet-broader notion 
of causal irreversibility S. It has the advantage of being 
scalar rather than Boolean and so has something to say 
quantitatively about all processes. Also, it derives naturally from its simple relationship to $\mathbf{E}$ and $\chi$. In this light, microscopic reversibility appears to be too strong a criterion, missing important structural properties.

The time-agnostic perspective hinged on expanding 
the space of representations. First, we described par- 
allel predictive and retrodictive causal models joined by 



the switching maps. We then introduced a bidirectional machine $M^{\pm}$ that compressed the two directional representations into one. The associated joint causal-state space allowed us to make
rather nonintuitive statements about prediction (retrod- 
iction) conditioned on these joint states. The operational 
meaning of the bidirectional machine certainly warrants 
further attention. It also seems likely that its nonunifi- 
larity has not yet been fully appreciated. One might 
wish to consider, for example, a unifilar representation 
of it. Somewhat hopefully, we end by noting that the 
bidirectional machine suggests an extension of e-machine 
analysis beyond one-dimensional processes. 
APPENDIX: THE MIXED-STATE
PRESENTATION IS SUFFICIENT TO
CALCULATE THE SWITCHING MAPS

While we conjecture that the mixed-state operation $\mathcal{U}(\widetilde{M}^+)$ yields an $\epsilon$-machine, this remains an open problem. Our conjecture, however, is based on a rather large number of test cases in which it is an $\epsilon$-machine. Fortunately for our present needs, we can show that $\mathcal{U}(\widetilde{M}^+)$ is sufficient for calculating the conditional probability distribution $\Pr(\mathcal{S}^+|\mathcal{S}^-)$.

For a moment, ignore the details of forward and reverse machines and simply consider machines $A$ and $B$ such that $\mathcal{U}(A) = B$ where neither $A$ nor $B$ is necessarily an $\epsilon$-machine. We would like to learn the conditional probability distribution $\Pr(\mathcal{R}_A|\mathcal{R}_B)$, where $\mathcal{R}_A$ and $\mathcal{R}_B$ are $A$'s and $B$'s states, respectively.

Proposition 11. $B$'s states are mixed states of $A$.

Proof. We use the mixed-state presentation algorithm to form states based on the transition matrices of $A$. If a state $\mathcal{R}_B$ is induced by a word $w$, then:

$$\mathcal{R}_B = \frac{\pi_A T_A^{(w)}}{\pi_A T_A^{(w)} \mathbf{1}} .$$

We now show that $B$ is deterministic.

Proposition 12. $H[\mathcal{R}'|\mathcal{R}, X] = 0$ for machine $B$.

Proof. Although any given state in $B$ will generally be a distribution over states in $A$, each of these distributions defines a state of $B$. The particular state of $B$ (or distribution over states in $A$), $\mathcal{R}'$, that follows $\mathcal{R}$ and $X = x$ can be written:

$$\mathcal{R}' = \frac{\pi_A T_A^{(w)} T_A^{(x)}}{\pi_A T_A^{(w)} T_A^{(x)} \mathbf{1}} .$$

So, by construction, $B$ is deterministic. Moreover, $\mathcal{R}_B$ is a refinement of $\mathcal{S}_B$. □



Proposition 13. Two pasts that induce the same state in $B$ must be pasts in the same causal state of $B$'s $\epsilon$-machine.

Proof. The future probability distribution given a word is exactly the future probability distribution given the mixed state induced by that word:

$$\Pr(\overrightarrow{X}^L = x^L | w) = \frac{\pi T^{(w)} T^{(x^L)} \mathbf{1}}{\pi T^{(w)} \mathbf{1}} = \mu(w) T^{(x^L)} \mathbf{1} = \Pr(\overrightarrow{X}^L = x^L | \mu(w)) .$$

Therefore, if two words induce the same mixed state, the future probability distributions conditioned on those words are the same. This means that those words are causally equivalent and thus in the same causal state. □

Now we show how, even in this very generic case, we 
can calculate the relevant conditional probability distri- 
bution. 

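The mixed-state construction of $B$ implicitly has given us $\Pr(\mathcal{R}_A|\mathcal{R}_B)$, which we can use to find $\Pr(\mathcal{R}_A|\mathcal{S}_B)$, our goal:

$$\begin{aligned}
\Pr(\mathcal{R}_A|\mathcal{S}_B) &= \sum_{\mathcal{R}_B} \Pr(\mathcal{R}_A|\mathcal{S}_B, \mathcal{R}_B) \Pr(\mathcal{R}_B|\mathcal{S}_B) \\
&= \sum_{\mathcal{R}_B} \Pr(\mathcal{R}_A|\mathcal{R}_B) \Pr(\mathcal{R}_B|\mathcal{S}_B) \\
&= \sum_{\mathcal{R}_B} \Pr(\mathcal{R}_A|\mathcal{R}_B) \Pr(\mathcal{S}_B|\mathcal{R}_B) \frac{\Pr(\mathcal{R}_B)}{\Pr(\mathcal{S}_B)} \\
&= \sum_{\mathcal{R}_B} \Pr(\mathcal{R}_A|\mathcal{R}_B) \, \delta_{\mathcal{R}_B \in \mathcal{S}_B} \frac{\Pr(\mathcal{R}_B)}{\Pr(\mathcal{S}_B)} \\
&= \sum_{\mathcal{R}_B \in \mathcal{S}_B} \Pr(\mathcal{R}_A|\mathcal{R}_B) \frac{\Pr(\mathcal{R}_B)}{\Pr(\mathcal{S}_B)} .
\end{aligned}$$

The second line follows since $\mathcal{R}_B$ is a refinement of $\mathcal{S}_B$. The third line is an application of Bayes Rule. The fourth line follows again from the refinement. The final form reminds us that $\mathcal{S}_B$ is not a free variable.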

To sum up, we calculate the conditional distribution using this final form as follows. The first factor is found by applying $\mathcal{U}$ to $A$. Granting ourselves the ability to ascertain predictive equality among a finite set of states $\mathcal{R}_B$, we determine if $\mathcal{R}_B \in \mathcal{S}_B$ for each $\mathcal{R}_B$. Lastly, we compute the stationary distribution over the states of $B$ and divide by the stationary probability of the corresponding causal state.
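The three steps can be sketched in code. This is an illustration only: the grouping of $B$'s states into causal states is taken as given (the text grants the ability to test predictive equality), and the function name, argument layout, and toy numbers are all hypothetical.

```python
from fractions import Fraction as F

def switching_map(mixed, stat, groups):
    """Pr(R_A | S_B) = sum over R_B in S_B of Pr(R_A | R_B) Pr(R_B) / Pr(S_B).

    mixed[k]  : the mixed state of B-state k, i.e. Pr(R_A | R_B = k)
    stat[k]   : stationary probability of B-state k
    groups[s] : list of B-states belonging to causal state s
    """
    result = {}
    for s, members in groups.items():
        p_s = sum(stat[k] for k in members)          # Pr(S_B = s)
        n = len(mixed[members[0]])
        result[s] = [sum(mixed[k][i] * stat[k] for k in members) / p_s
                     for i in range(n)]
    return result

# Toy input: B-states b1 and b2 are assumed predictively equivalent.
mixed = {"b1": [F(1), F(0)], "b2": [F(0), F(1)], "b3": [F(1, 3), F(2, 3)]}
stat = {"b1": F(1, 4), "b2": F(1, 4), "b3": F(1, 2)}
groups = {"s1": ["b1", "b2"], "s2": ["b3"]}
result = switching_map(mixed, stat, groups)
```

When every causal state contains exactly one state of $B$, the sum has a single term and the function simply returns the mixed states themselves, which is the case where $\mathcal{U}$ already yields an $\epsilon$-machine.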

In effect, this establishes a general method for com- 
puting the conditional probability of states from the "in- 
put" machine given a state of the "resultant" machine. 



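We can now recall the specific context of forward and reverse $\epsilon$-machines and apply this technique to calculate $\mathbf{E}$ in the case where the resultant machine $\mathcal{U}(\mathcal{T}(M^+))$ is not an $\epsilon$-machine.

The input machine is the reversed $\epsilon$-machine $\mathcal{T}(M^+)$, whose states $\widetilde{\mathcal{S}}^+$ are in one-to-one correspondence with $\mathcal{S}^+$. Thus, the previous result:

$$\Pr(\mathcal{R}_A|\mathcal{S}_B) = \sum_{\mathcal{R}_B \in \mathcal{S}_B} \Pr(\mathcal{R}_A|\mathcal{R}_B) \frac{\Pr(\mathcal{R}_B)}{\Pr(\mathcal{S}_B)}$$

now becomes:

$$\Pr(\mathcal{S}_A|\mathcal{S}_B) = \sum_{\mathcal{R}_B \in \mathcal{S}_B} \Pr(\mathcal{S}_A|\mathcal{R}_B) \frac{\Pr(\mathcal{R}_B)}{\Pr(\mathcal{S}_B)}$$

or, more specifically,

$$\Pr(\mathcal{S}^+|\mathcal{S}^-) = \sum_{\mathcal{R}_B \in \mathcal{S}^-} \Pr(\mathcal{S}^+|\mathcal{R}_B) \frac{\Pr(\mathcal{R}_B)}{\Pr(\mathcal{S}^-)} .$$

From which we readily calculate $\mathbf{E}$ using:

$$\mathbf{E} = I[\mathcal{S}^+;\mathcal{S}^-] = H[\mathcal{S}^+] - H[\mathcal{S}^+|\mathcal{S}^-] .$$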
Throughout, we follow the notation and definitions of Refs. 0, [121. In addition, when we say $\overleftarrow{X}$, for example, this should be interpreted as a shorthand for using $\overleftarrow{X}^L$ and then taking an appropriate limit, such as $\lim_{L \to \infty}$ or $\lim_{L \to \infty} 1/L$.

A process's causal states consist of both transient and 
recurrent states. To simplify the presentation, we hence- 
forth refer only to recurrent causal states that are dis- 
crete. 

Following terminology in computation theory this is re- 
ferred to as determinism [33|. However, to reduce confu- 
sion, here we adopt the practice in information theory to 
call it the unifilarity of a process's representation [3ll |. 
Specifically, the transition matrices have at most one 
nonzero component in each row. 
Interpret the symbol ± as "plus and minus". 
This calculation gives the probability of transitioning 
from a transient causal state to a recurrent causal state 
on seeing 1. 



